ABSTRACT

"Teaching the test" has been defined in terms of teaching those particular content knowledges or skills needed to answer the test items correctly. Evidence of several sorts examined in this paper clearly indicates that New Century was teaching students in Providence, R.I., the Gates-MacGinitie Reading Test, which was used to assess their vocabulary achievement. The coincidence between vocabulary taught in the instructional package and the vocabulary required to respond correctly to test items on the Gates-MacGinitie was determined to be much greater than could be attributed to chance, and the data showed that the teaching program needed to be only moderately effective to improve student gains in grade-equivalent scores on the test substantially. On the basis of the analyses summarized in the paper, if the instructional materials are only 30 percent effective, student gains in grade-equivalent scores should average nearly twice those which would normally be found. (MBM)

I. Background 

A student's performance on a standardized test is useful to us in making statements about his level of achievement only insofar as the test items have some congruence with our expectations about what a student should know at any particular time. For many reasons, vocabulary subtests provide a better vehicle for discussing this issue than do most others. Consider the finding that students at the end of second grade typically have a sight vocabulary of 1,000 words or more. Obviously, we could ascertain the extent to which a particular student meets this standard if we presented him with all the words which experts have agreed upon as rightly being included in this domain. Such an assessment procedure would enable us to make fairly precise statements about the student's level of accomplishment. However, the demands of such a procedure, both physical and psychological, on students, teachers, and testers would be unreasonable.

As an alternative, then, we sample from the vocabulary items in this domain some smaller number to which we ask students to respond. (Of course, some of the items in this vocabulary do not lend themselves to "testing" in any convenient way; e.g., "a," "the," "or," "I.") In the process of developing a standardized achievement test for vocabulary, many items are considered for inclusion; a smaller number are actually used in preliminary versions of the test; and a still smaller number appear in the final, published form(s) of the test. Those items which are retained are those which contribute most to the overall reliability and validity of the test.

The items which actually appear in a standardized test, then, are but a sample of the items which a student might reasonably be expected to know and on which he might reasonably be tested. It is on the basis of a student's performance on this sample of appropriate behaviors that we make inferences about his level of achievement in the domain of interest (vocabulary, in this case). Insofar as performance on the test may be considered representative of what the student might be expected to do when exposed to the larger collection of behavior samples from which the test items were selected, that test performance is a valid indicator of his achievement level.

When the instructional process is such that the particular knowledges or skills required for successful performance on the particular test form(s) to be used are in fact specifically taught, the behaviors sampled in the test are no longer representative of the domain to which we wish to generalize. Thus, the most crucial consideration in deciding whether "teaching the test" has occurred is whether the instructional content is of such a form as to render the test (and, consequently, normative inferences based on test performance) invalid as an indicator of the general body of knowledge to which inferences are to be made.*
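The loss of representativeness described above can be made concrete with a small expected-value sketch. The model is ours, not the author's, and rests on simplifying assumptions: each word is known independently with the same probability before instruction, and instruction helps only words the student did not already know. The figures are illustrative: a 1,000-word domain (the sight vocabulary cited for the end of grade two), a unit teaching 120 words, and a 48-item form on which 31 items are covered (the Level B counts given in Table 1). The function name is hypothetical.

```python
def expected_proportion_known(p_known, covered_fraction, effectiveness):
    """Expected proportion of a word set answered correctly after instruction.

    Hypothetical model (an illustrative assumption, not the paper's):
      p_known          -- chance the student already knew any given word
      covered_fraction -- share of the word set that the instruction covers
      effectiveness    -- chance that instruction teaches a covered word the
                          student did not already know
    """
    newly_learned = covered_fraction * (1 - p_known) * effectiveness
    return p_known + newly_learned


p_before, effectiveness = 0.50, 0.30

# Whole domain: 120 taught words in an assumed 1,000-word domain.
domain_after = expected_proportion_known(p_before, 120 / 1000, effectiveness)

# Test sample: 31 of 48 Level B items covered by the unit (Table 1).
test_after = expected_proportion_known(p_before, 31 / 48, effectiveness)

print(f"gain in the domain:      {domain_after - p_before:.3f}")
print(f"gain on the test sample: {test_after - p_before:.3f}")
```

With these assumed figures the gain on the test sample (about 0.097) is more than five times the gain in the domain as a whole (0.018). If the test's coverage matched the domain's, as it would for a representative sample, the two gains would coincide.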

It is on this basis, then, that we should evaluate the possibility that the contractor, New Century, has violated the provision of the contract in which it "agrees that it will not teach the Gates-MacGinitie reading test..." (Section 20 of the contract agreement).



II. Analysis

A student's vocabulary achievement is assessed, on any version of the Gates-MacGinitie Reading Test: Vocabulary, on the basis of his responses to anywhere from 48 (Level B, Grade 2) to 52 (Level C, Grade 3) items. The general form of these items is most easily represented in terms of what psychologists call a "paired-associates" task. That is, given a "stimulus" word (such as "incredible"), the student must associate with it some "response" word (such as "unbelievable"). As an illustration, consider item 30 from Primary C, Form 1 of the Gates-MacGinitie:



*"Preparing" students for a test can take several forms: providing them with practice in the test-taking situation by giving them experience with the item forms (but not the content) they will encounter on the test; providing them exposure to the specific content which they will encounter on the test; giving them experience with both the content and the form of the test; and coaching them on the specific items from the test in the form in which they actually appear. The first of these is a legitimate form of preparation in that it tends to reduce the contribution of extraneous, situationally linked factors which are irrelevant to achievement in the domain of interest but which might affect performance on the test. The remaining three procedures are illegitimate (in that they invalidate the test as a representative sample of the behavior domain to which inferences are to be made), with the last being the most blatant and dishonest attempt to invalidate the test and inappropriately enhance student performance.
30. medicine
    meadow
    iron
    spider
    drug

Here, the student is presented with the stimulus word "medicine." The correct response is "drug." If "teaching the test" has occurred, we would expect to find this same pairing of words in Unit S of Word Wizards. In fact, item 30 of part 6 of Unit S of Word Wizards is:

30. A drug you take when you're sick is
    a. brook
    b. medicine
    c. hurry

In this instance, then, the same words are paired in the instructional materials as are used in assessing student vocabulary on the Gates-MacGinitie Reading Test.
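The counting rule behind the "No. of Items Common" column of Table 1 (a test item counts if its correct stimulus-response pair also appears in Word Wizards) can be sketched as a set lookup. Treating a pairing as unordered is our assumption, motivated by the example above, where the test runs from "medicine" to "drug" and the workbook from "drug" to "medicine". The Level B matching by meaning of the response word (Table 1, note c) is not modeled. Only the medicine/drug pair comes from the text; the other pairs and both function names are hypothetical.

```python
def pair_key(word_a, word_b):
    """Unordered, case-insensitive key for a stimulus-response pairing.

    Direction is ignored (an assumption): the test pairs "medicine" with
    "drug", while the workbook item describes "drug" and keys "medicine".
    """
    return frozenset((word_a.lower(), word_b.lower()))


def items_in_common(test_pairs, instruction_pairs):
    """Test items whose correct pairing also occurs in the instructional items."""
    taught = {pair_key(a, b) for a, b in instruction_pairs}
    return [pair for pair in test_pairs if pair_key(*pair) in taught]


# Only medicine/drug comes from the text; the other pairs are hypothetical.
test_pairs = [("medicine", "drug"), ("incredible", "unbelievable")]
instruction_pairs = [("drug", "medicine"), ("hurry", "rush")]

common = items_in_common(test_pairs, instruction_pairs)
print(f"{len(common)} of {len(test_pairs)} test items in common: {common}")
```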

Granting that some common paired associates appearing in the Gates-MacGinitie will appear in an instructional package such as Word Wizards simply by chance (and the contractor is not to be penalized for such "chance coincidence"), the question remains: Is the coincidence between vocabulary taught in Word Wizards and vocabulary required to respond correctly to test items on the Gates-MacGinitie attributable to chance?

If a definition of "coincidence" based on the "paired-associate" conceptualization described above is adopted, the results presented in Table 1 are obtained. Without reference to any other evidence, these data are conclusive:



Insert Table 1 about here

TABLE 1

Analysis of the Relationship of Instructional Content (Word Wizards)
to Test Items (Gates-MacGinitie Reading Test: Vocabulary)

                                      Word
         Test    Test    No. of       Wizards   No. of      No. of Items
Grade    Level   Form    Items        Unit      Items(a)    Common(b)

2        B       1       48(c)        R         120         31
                 2       48(c)                              31
3        C       1       12+40(d)     S         120         3+20(d)
                 2       12+40(d)                           5+20(d)
4-6      D       1       50           T         120         26
                 2       50                                 26
                 3       50                                 0
7-9      E       2       50           U         120         29

(a) Number of stimulus words taught in the Unit.

(b) Items in which the "correct" stimulus-response pairs from the test appear in Word Wizards.

(c) The items in Level B are picture-stimulus/word-response items. At this level, the correspondence of test items and instructional content was assessed in terms of whether or not the meanings of the response words for the test were presented in Unit R of Word Wizards.

(d) Form C-1 and Form C-2 each contain 12 picture/word items (like those in Level B) and 40 word-stimulus/word-response items of the type illustrated in the "medicine/drug" example in the text. For Form C-1, three of the 12 picture/word items and 20 of the 40 word/word items occur in Word Wizards; for Form C-2, the corresponding numbers are five and 20.
1. The commonality for every test form examined, except D-3, is much greater than would be expected by chance (especially at the upper grade levels, where the child's vocabulary should consist of several thousand words*).

2. Additionally, the same degree of overlap occurs for both forms at grade two (B-1 and B-2), for the word/word items on both forms at grade three (C-1 and C-2), and for two of the three forms at grades four through six (D-1 and D-2).

3. Equally telling is the complete lack of commonality between Unit T of Word Wizards and test form D-3.
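The paper states that the overlap is far beyond chance but does not show a calculation. One way to put a number on "chance," offered only as an illustration under strong assumptions of our own, is a hypergeometric tail probability: treat the 48 Level B test words as a random draw from a 1,000-word domain (the sight vocabulary cited for the end of grade two), 120 of which are taught in Unit R, and ask how likely 31 or more shared items would be. Real test and workbook vocabularies are not random draws (both favor grade-appropriate words), so the result indicates orders of magnitude only; a larger domain would make chance overlap smaller still. The function name is hypothetical.

```python
from math import comb


def chance_overlap_tail(domain_size, taught_words, test_items, observed):
    """P(at least `observed` test words are also taught words) by pure chance.

    Models the test's words as a random draw, without replacement, of
    `test_items` words from a domain of `domain_size` words, of which
    `taught_words` are taught in the unit (a hypergeometric upper tail).
    """
    favourable = sum(
        comb(taught_words, k) * comb(domain_size - taught_words, test_items - k)
        for k in range(observed, min(taught_words, test_items) + 1)
    )
    return favourable / comb(domain_size, test_items)


N, K, n, observed = 1000, 120, 48, 31   # assumed domain size; Level B counts
expected = n * K / N
p_value = chance_overlap_tail(N, K, n, observed)
print(f"expected by chance: {expected:.2f} items")
print(f"P(at least {observed} items in common) = {p_value:.1e}")
```

Under these assumptions about 5.8 shared items would be expected by chance, and the probability of 31 or more is far below one in ten billion.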



Supplementary data provided by Judith E. Barry in her July 13th memorandum to Dr. Bernardo and Mr. Kramer support the findings presented above. Not only did she find an extremely high overlap between Word Wizards and the forms of the Gates-MacGinitie but, in addition, her analysis, while proceeding on a slightly different basis from the one reported here, also provides comparative data relating Word Wizards to other standardized reading tests. In general, her figures indicate a commonality of less than twenty percent for most tests other than the Gates-MacGinitie. The Barry memorandum also highlights the relatively uncommon words (based on standard lists of common words) which occur both on the Gates-MacGinitie tests and in Word Wizards.



The conclusion is inescapable: New Century, through its Word Wizards materials, was teaching the word associations required to respond correctly to items on the Gates-MacGinitie Reading Test: Vocabulary. The coincidence between teaching materials and test items cannot be attributed to "chance." If the Word Wizards program was at all effective, the results from the Gates-MacGinitie have been invalidated as indicants of vocabulary achievement for contract-program pupils.

*Even at the end of grade two, typical sight vocabulary is 1,000 words or more.



In particular, what are the possible effects of exposure to the content of Word Wizards on the change in grade-equivalent scores for students in the contract program? One approach to this question is in terms of "typical" students of various sorts. For example, consider a second grader who was performing at an "average" level upon entry into the program in December. He would obtain a raw score of 25-26 on one of the forms of Level B of the Gates-MacGinitie (Vocabulary), giving him a grade-equivalent (GE) score of 2.3 years. If he remains an "average" student at the time of the exit test in May, he will obtain a GE score of 2.9 (a raw score of 34). Thus, this student will have gained 8-9 items and 0.6 years. Assume that one wishes this student to show a gain of at least 1.0 years GE, to a terminal level of 3.3 years. For Level B of the Gates-MacGinitie, a GE score of 3.3 years corresponds to a raw score of 36, so improving the student's performance by only two items will result in a GE increase of 0.4 years.
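The grade-equivalent arithmetic in the preceding paragraph can be sketched as a lookup. Only three raw-score/GE points for Level B are given in the text (raw 26 gives GE 2.3, raw 34 gives 2.9, raw 36 gives 3.3; the December score is given as 25-26, and 26 is used here), so the dictionary below is a partial stand-in for the publisher's full conversion table, not a copy of it. GE values are held in tenths of a year so that comparisons are exact; the names are hypothetical.

```python
# Raw score -> grade equivalent (GE), in tenths of a year, for Level B.
# Only these three points appear in the text; a real analysis would use the
# publisher's full conversion table.
LEVEL_B_GE_TENTHS = {26: 23, 34: 29, 36: 33}


def raw_needed_for(target_ge_tenths, conversion):
    """Smallest listed raw score whose GE reaches the target."""
    return min(raw for raw, ge in conversion.items() if ge >= target_ge_tenths)


december_raw, may_raw = 26, 34          # "average" second grader
december_ge = LEVEL_B_GE_TENTHS[december_raw]
normal_gain = LEVEL_B_GE_TENTHS[may_raw] - december_ge           # in tenths
target_raw = raw_needed_for(december_ge + 10, LEVEL_B_GE_TENTHS)  # +1.0 year
extra_items = target_raw - may_raw

print(f"normal gain: {normal_gain / 10} years")
print(f"raw score for a 1.0-year gain: {target_raw} ({extra_items} more items)")
```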

To relate this potential improvement to the possible effects of exposure to Word Wizards is the next task. If a typical second-grade student will in fact correctly answer 34 of the 48 vocabulary items on either form of Level B of the Gates-MacGinitie without any special intervention, there are only 14 items remaining on which he could show further improvement. If we assume that the proportion of those 14 items included in Word Wizards is the same as the proportion of all 48 items appearing in Word Wizards (31/48, or .65), then the student will have been exposed to nine of the items he would ordinarily have missed. He needs to recall only two of those nine items in order to raise his apparent vocabulary growth during the contract period from 0.6 years GE to 1.0 years GE. These figures represent an effectiveness rate for the Word Wizards program of only 22 percent (two of nine items).
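The effectiveness-rate calculation in the preceding paragraph can be written out as a short function. It follows the paragraph's assumption that the items a student would normally miss appear in Word Wizards at the same rate as the form as a whole. Truncating the expected count to whole items is our own assumption, since the text rounds 14 x .65 to nine without stating a rule; the share is taken here as 31/48 rather than the rounded .65, and both give nine items. The function name is hypothetical.

```python
def effectiveness_rate(items_needed, test_length, expected_raw, share_in_program):
    """Share of normally-missed-but-taught items the student must recall.

    items_needed     -- extra correct answers needed for a 1.0-year GE gain
    test_length      -- number of items on the test form
    expected_raw     -- raw score expected with no special intervention
    share_in_program -- proportion of test items that also appear in Word Wizards
    The expected number of missed items that were taught is truncated to whole
    items (an assumption; the paper does not state its rounding rule).
    """
    missed = test_length - expected_raw
    exposed = int(missed * share_in_program)
    return items_needed / exposed, exposed


# Worked example from the text: "average" second grader, Level B (48 items).
rate, exposed = effectiveness_rate(items_needed=2, test_length=48,
                                   expected_raw=34, share_in_program=31 / 48)
print(f"exposed to {exposed} normally missed items; rate = {rate:.0%}")
```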

Insert Table 2 about here

Similar arguments could be developed for "typical" students in grades three through eight and for other student groups in all grades. Table 2 presents just such an analysis. In addition to "average" students (those at the fiftieth percentile in the norm distribution), students ranking at the sixteenth (one standard deviation below the norm group mean), the thirty-first (one-half standard deviation below the norm group mean), and the sixty-ninth (one-half standard deviation above the norm group mean) percentiles have also been included in this table. On the basis of the analyses summarized in Table 2, one can conclude that, if the instructional materials (the Word Wizards program) are only thirty percent effective on the average, student gains in grade-equivalent scores should average nearly twice the magnitude of those which would ordinarily be found.
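Table 2's "Percent in Word Wizards" column (65, 46, 52, 58) is not derived in the text. Pooling the "items common" counts from Table 1 across the forms at each level, with form D-3 left out, reproduces those figures; this is our reconstruction, offered as a consistency check rather than as the author's stated procedure. The dictionary and function names are hypothetical.

```python
from fractions import Fraction

# (items common, items on form) for each form, taken from Table 1.
# Form D-3 (0 of 50) is omitted; including it would give 35 percent for Level D.
LEVEL_FORMS = {
    "B": [(31, 48), (31, 48)],
    "C": [(3 + 20, 52), (5 + 20, 52)],
    "D": [(26, 50), (26, 50)],
    "E": [(29, 50)],
}


def pooled_percent(forms):
    """Pooled percentage of test items in common, rounded to a whole percent."""
    common = sum(c for c, _ in forms)
    total = sum(n for _, n in forms)
    return round(100 * Fraction(common, total))


for level, forms in LEVEL_FORMS.items():
    print(level, pooled_percent(forms))
```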

III. Synopsis

"Teaching the test" has been defined in terms of teaching those particular content knowledges or skills needed to answer the test items correctly. Converging evidence of several sorts clearly indicates that New Century was teaching the Gates-MacGinitie Reading Test. An analysis of the normative data for the test shows that the teaching program (Word Wizards) need be only moderately effective to improve student gains in grade-equivalent scores on that test substantially.

TABLE 2

Possible Impact of Exposure to Word Wizards on Scores on the Gates-MacGinitie
Reading Test: Vocabulary, for Selected Subgroups of Students

Grade  Percentile  "Normal" raw score  Additional items needed   Percent in     Effectiveness
       rank        December   May     for 1.0 year's gain       Word Wizards   rate (percent)

2      16          15         21      7                         65             41
       31          20         27      5                         65             39
       50          26         34      2                         65             22
       69          33         41      none                      65             0
3      16          19         23      6                         46             46
       31          25         29      6                         46             55
       50          30         35      3                         46             37
       69          35         40      1                         46             20
4      16          14         17      5                         52             29
       31          18         22      3                         52             21
       50          22         26      3                         52             25
       69          27         30      2                         52             27
5      16          19         21      6                         52             55
       31          24         26      4                         52             33
       50          29         31      2                         52             22
       69          33         35      1                         52             14
6      16          23         26      4                         52             31
       31          28         30      3                         52             30
       50          33         34-35   1-2                       52             25
       69          35-36      38      1                         52             17
7      16          13         14      3                         58             15
       31          16-17      17-18   2                         58             11
       50          20         21      2                         58             12
       69          24         25      3                         58             21
8      16          15-16      16-17   2                         58             11
       31          19         20      2                         58             11
       50          23         24      3                         58             20
       69          27         28      1                         58